Abstract

We propose a new class of estimators of the multivariate response linear regression coefficient 
matrix that exploits the assumption that the response and predictors have a joint multivariate 
Normal distribution. This allows us to indirectly estimate the regression coefficient matrix 
through shrinkage estimation of the parameters of the inverse regression, or the conditional 
distribution of the predictors given the responses. We establish a convergence rate bound 
for estimators in our class and we study two examples. The first example estimator exploits 
an assumption that the inverse regression’s coefficient matrix is sparse. The second example 
estimator exploits an assumption that the inverse regression’s coefficient matrix is rank deficient. 
These estimators do not require the popular assumption that the forward regression coefficient 
matrix is sparse or has small Frobenius norm. Using simulation studies, we show that our 
example estimators outperform relevant competitors for some data generating models. 


1 Introduction 

Some statistical applications require the modeling of a multivariate response. Let $y_i \in \mathbb{R}^q$ be the measurement of the $q$-variate response for the $i$th subject and let $x_i \in \mathbb{R}^p$ be the nonrandom values of the $p$ predictors for the $i$th subject $(i = 1, \ldots, n)$. The multivariate response linear regression model assumes that $y_i$ is a realization of the random vector

$$Y_i = \mu_* + \beta_*' x_i + \varepsilon_i, \quad i = 1, \ldots, n, \tag{1}$$

where $\mu_* \in \mathbb{R}^q$ is the unknown intercept, $\beta_*$ is the unknown $p$ by $q$ regression coefficient matrix, and $\varepsilon_1, \ldots, \varepsilon_n$ are independent copies of a mean zero random vector with covariance matrix $\Sigma_{*E}$. The ordinary least squares estimator of $\beta_*$ is

$$\hat\beta^{(\mathrm{OLS})} = \arg\min_{\beta \in \mathbb{R}^{p \times q}} \|\mathbb{Y} - \mathbb{X}\beta\|_F^2, \tag{2}$$

where $\|\cdot\|_F$ is the Frobenius norm, $\mathbb{R}^{p \times q}$ is the set of real valued $p$ by $q$ matrices, $\mathbb{Y}$ is the $n$ by $q$ matrix with $i$th row $(Y_i - n^{-1}\sum_{i=1}^n Y_i)'$, and $\mathbb{X}$ is the $n$ by $p$ matrix with $i$th row $(x_i - n^{-1}\sum_{i=1}^n x_i)'$ $(i = 1, \ldots, n)$. It is well known that $\hat\beta^{(\mathrm{OLS})}$ is the maximum likelihood estimator of $\beta_*$ when $\varepsilon_1, \ldots, \varepsilon_n$ are independent and identically distributed $N_q(0, \Sigma_{*E})$ and the corresponding maximum likelihood estimator of $\Sigma_{*E}^{-1}$ exists.
Many shrinkage estimators of $\beta_*$ have been proposed by penalizing the optimization in (2). Some simultaneously estimate $\beta_*$ and remove irrelevant predictors (Turlach et al., 2005; Obozinski et al., 2010; Peng et al., 2010). Others encourage an estimator of reduced rank (Yuan et al., 2007; Chen and Huang, 2012).

Under the restriction that $\varepsilon_1, \ldots, \varepsilon_n$ are independent and identically distributed $N_q(0, \Sigma_{*E})$, shrinkage estimators of $\beta_*$ that penalize or constrain the minimization of the negative loglikelihood have been proposed. These methods simultaneously estimate $\beta_*$ and $\Sigma_{*E}^{-1}$. Examples include maximum likelihood reduced rank regression (Izenman, 1975; Reinsel and Velu, 1998), envelope models (Cook et al., 2010; Su and Cook, 2011, 2012, 2013), and multivariate regression with covariance estimation (Rothman et al., 2010; Lee and Liu, 2012; Bhadra and Mallick, 2013).


To fit (1) with these shrinkage estimators, one exploits explicit assumptions about $\beta_*$, but these may be unreasonable in some applications. As an alternative, we propose an indirect method to fit (1) without making explicit assumptions about $\beta_*$. We exploit the assumption that the response and predictors have a joint multivariate Normal distribution and we employ shrinkage estimators of the parameters of the conditional distribution of the predictors given the response. Our method provides an alternative indirect estimator of $\beta_*$, which may be suitable when the existing shrinkage estimators are inadequate.


2 A new class of indirect estimators of $\beta_*$


2.1 Class definition 

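We assume that the measured predictor and response pairs $(x_1, y_1), \ldots, (x_n, y_n)$ are a realization of $n$ independent copies of $(X, Y)$, where $(X', Y')' \sim N_{p+q}(\mu_*, \Sigma_*)$. We also assume that $\Sigma_*$ is positive definite. Define the marginal parameters through the following partitions:

$$\mu_* = \begin{pmatrix} \mu_{*X} \\ \mu_{*Y} \end{pmatrix}, \qquad \Sigma_* = \begin{pmatrix} \Sigma_{*XX} & \Sigma_{*XY} \\ \Sigma_{*XY}' & \Sigma_{*YY} \end{pmatrix}.$$

Our goal is to estimate the multivariate regression coefficient matrix $\beta_* = \Sigma_{*XX}^{-1}\Sigma_{*XY}$ in the forward regression model

$$(Y \mid X = x) \sim N_q\{\mu_{*Y} + \beta_*'(x - \mu_{*X}), \Sigma_{*E}\},$$

without assuming that $\beta_*$ is sparse or that $\|\beta_*\|_F^2$ is small. To do this we will estimate the inverse regression's coefficient matrix $\eta_* = \Sigma_{*YY}^{-1}\Sigma_{*XY}'$ and the inverse regression's error precision matrix $\Delta_*^{-1}$ in the inverse regression model

$$(X \mid Y = y) \sim N_p\{\mu_{*X} + \eta_*'(y - \mu_{*Y}), \Delta_*\}.$$

We connect the parameters of the inverse regression model to $\beta_*$ with the following proposition.

Proposition 1. If $\Sigma_*$ is positive definite, then

$$\beta_* = \Delta_*^{-1}\eta_*'\left(\Sigma_{*YY}^{-1} + \eta_*\Delta_*^{-1}\eta_*'\right)^{-1}. \tag{3}$$

We prove Proposition 1 in Appendix A.1. This result leads us to propose a class of estimators of $\beta_*$ defined by

$$\hat\beta = \hat\Delta^{-1}\hat\eta'\left(\hat\Sigma_{YY}^{-1} + \hat\eta\hat\Delta^{-1}\hat\eta'\right)^{-1}, \tag{4}$$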
where $\hat\eta$, $\hat\Delta$, and $\hat\Sigma_{YY}$ are user-selected estimators of $\eta_*$, $\Delta_*$, and $\Sigma_{*YY}$. If $n > \max(p, q)$ and the ordinary sample estimators are used for $\hat\eta$, $\hat\Delta$ and $\hat\Sigma_{YY}$, then $\hat\beta$ is equivalent to $\hat\beta^{(\mathrm{OLS})}$.

We propose to use shrinkage estimators of $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$ in (4). This gives us the potential to indirectly fit an unparsimonious forward regression model by fitting a parsimonious inverse regression model. For example, suppose that $\eta_*$ and $\Delta_*^{-1}$ are sparse, but $\beta_*$ is dense. To fit the inverse regression model, we could use any of the forward regression shrinkage estimators discussed in Section 1.
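The plug-in form (4) and its equivalence with ordinary least squares under the ordinary sample estimators can be checked numerically. The following is a minimal sketch on simulated data, assuming NumPy; the function name `indirect_beta`, the dimensions, and the data generating choices are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def indirect_beta(eta_hat, delta_inv_hat, sigma_yy_inv_hat):
    """Plug-in estimator (4): Delta^{-1} eta' (Sigma_YY^{-1} + eta Delta^{-1} eta')^{-1}.

    eta_hat is q by p, delta_inv_hat is p by p, sigma_yy_inv_hat is q by q.
    """
    middle = sigma_yy_inv_hat + eta_hat @ delta_inv_hat @ eta_hat.T
    return delta_inv_hat @ eta_hat.T @ np.linalg.inv(middle)

# Simulated data (illustrative only).
rng = np.random.default_rng(0)
n, p, q = 200, 5, 3
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal((p, q)) + rng.standard_normal((n, q))
Xc = X - X.mean(axis=0)  # column-centered predictors
Yc = Y - Y.mean(axis=0)  # column-centered responses

# Ordinary sample estimators of the inverse regression parameters.
eta_hat = np.linalg.solve(Yc.T @ Yc, Yc.T @ Xc)  # q by p coefficient matrix
resid = Xc - Yc @ eta_hat
delta_hat = resid.T @ resid / n                  # inverse regression error covariance
sigma_yy_hat = Yc.T @ Yc / n                     # marginal covariance of the response

beta_indirect = indirect_beta(
    eta_hat, np.linalg.inv(delta_hat), np.linalg.inv(sigma_yy_hat)
)
beta_ols = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)  # forward least squares
```

Replacing any of the three inputs to `indirect_beta` with a shrinkage estimator gives another member of the class.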


2.2 Related work 


Lee and Liu (2012) proposed an estimator of $\beta_*$ that also exploits the assumption that $(X', Y')'$ is multivariate Normal; however, unlike our approach that makes no explicit assumptions about $\beta_*$, their approach assumes that both $\Sigma_*^{-1}$ and $\beta_*$ are sparse.

Modeling the inverse regression is a well-known idea in multivariate analysis. For example, when $Y$ is categorical, quadratic discriminant analysis models $(X \mid Y = y)$ as $p$-variate Normal. There are also many examples of modeling the inverse regression in the sufficient dimension reduction literature (Adragni and Cook, 2009).

The most closely related work to ours is that by Cook et al. (2013). They proposed indirect estimators of $\beta_*$ based on modeling the inverse regression in the special case when the response is univariate, i.e. $q = 1$. Under the same multivariate Normal assumption on $(X', Y')'$ that we make, Cook et al. (2013) showed that

$$\beta_* = \frac{\Delta_*^{-1}\Sigma_{*XY}}{1 + \Sigma_{*XY}'\Delta_*^{-1}\Sigma_{*XY}/\Sigma_{*YY}}. \tag{5}$$

They proposed estimators of $\beta_*$ by replacing $\Sigma_{*XY}$ and $\Sigma_{*YY}$ in the right hand side of (5) with their usual sample estimators, and by replacing $\Delta_*^{-1}$ with a shrinkage estimator. This class of estimators was designed to exploit an abundant signal rate in the forward univariate response regression when $p > n$.


3 Asymptotic Analysis 


We present a convergence rate bound for the indirect estimator of $\beta_*$ defined by (4). Our bound allows $p$ and $q$ to grow with the sample size $n$. In the following proposition, $\|\cdot\|$ is the spectral norm and $\varphi_{\min}(\cdot)$ is the minimum eigenvalue.

Proposition 2. Suppose that the following conditions are true: (i) $\Sigma_*$ is positive definite for all $p + q$; (ii) the estimator $\hat\Sigma_{YY}^{-1}$ is positive definite for all $q$; (iii) the estimator $\hat\Delta^{-1}$ is positive definite for all $p$; (iv) there exists a positive constant $K$ such that $\varphi_{\min}(\Sigma_{*YY}^{-1}) > K$ for all $q$; and (v) there exist sequences $\{a_n\}$, $\{b_n\}$ and $\{c_n\}$ such that $\|\hat\eta - \eta_*\| = O_P(a_n)$, $\|\hat\Delta^{-1} - \Delta_*^{-1}\| = O_P(b_n)$, $\|\hat\Sigma_{YY}^{-1} - \Sigma_{*YY}^{-1}\| = O_P(c_n)$, and $a_n\|\eta_*\| \cdot \|\Delta_*^{-1}\| + b_n\|\eta_*\|^2 + c_n \to 0$ as $n \to \infty$. Then

$$\|\hat\beta - \beta_*\| = O_P\left(a_n\|\eta_*\|^2\|\Delta_*^{-1}\|^2 + b_n\|\eta_*\|^3\|\Delta_*^{-1}\| + c_n\|\eta_*\| \cdot \|\Delta_*^{-1}\|\right).$$

We prove Proposition 2 in Appendix A.1. We used the spectral norm because it is compatible with the convergence rate bounds established for sparse inverse covariance estimators (Rothman et al., 2008; Lam and Fan, 2009; Ravikumar et al., 2011).


If the inverse regression is parsimonious in the sense that $\|\eta_*\|$ and $\|\Delta_*^{-1}\|$ are bounded, then the bound in Proposition 2 simplifies to $\|\hat\beta - \beta_*\| = O_P(a_n + b_n + c_n)$. From an asymptotic perspective, it is not surprising that the indirect estimator of $\beta_*$ is only as good as its worst plug-in estimator. We explore finite sample performance in Section 5.


4 Example estimators in our class 


4.1 Sparse inverse regression 

We now describe an estimator of the forward regression coefficient matrix $\beta_*$ defined by (4) that exploits zeros in the inverse regression's coefficient matrix $\eta_*$, zeros in the inverse regression's error precision matrix $\Delta_*^{-1}$, and zeros in the precision matrix of the responses $\Sigma_{*YY}^{-1}$. We estimate $\eta_*$ with

$$\hat\eta^{(L_1)} = \arg\min_{\eta \in \mathbb{R}^{q \times p}} \left\{ \|\mathbb{X} - \mathbb{Y}\eta\|_F^2 + \sum_{j=1}^{p} \lambda_j \sum_{i=1}^{q} |\eta_{ij}| \right\}, \tag{6}$$

which separates into $p$ $L_1$-penalized least-squares regressions (Tibshirani, 1996): the first predictor regressed on the response through the $p$th predictor regressed on the response. We select $\lambda_j$ with 5-fold cross-validation, minimizing squared prediction error totaled over the folds, in the regression of the $j$th predictor on the response $(j = 1, \ldots, p)$. This allows us to estimate the columns of $\eta_*$ in parallel.

We estimate $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ with $L_1$-penalized Normal likelihood precision matrix estimation (Yuan and Lin, 2007; Banerjee et al., 2008). Let $\hat\Sigma_{\gamma,S}^{-1}$ be a generic version of this estimator with tuning parameter $\gamma$ and input $p$ by $p$ sample covariance matrix $S$:

$$\hat\Sigma_{\gamma,S}^{-1} = \arg\min_{\Omega \in \mathbb{S}_+^p} \left\{ \mathrm{tr}(\Omega S) - \log|\Omega| + \gamma \sum_{j \neq k} |\omega_{jk}| \right\}, \tag{7}$$

where $\mathbb{S}_+^p$ is the set of symmetric and positive definite $p$ by $p$ matrices. There are many algorithms that solve (7). Two good choices are the graphical lasso algorithm (Yuan, 2008; Friedman et al., 2008) and the QUIC algorithm (Hsieh et al., 2011). We select $\gamma$ with 5-fold cross-validation maximizing a validation likelihood criterion (Huang et al., 2006):

$$\hat\gamma = \arg\min_{\gamma \in \mathcal{G}} \sum_{k=1}^{5} \left\{ \mathrm{tr}\left(S_{(k)}\hat\Sigma_{\gamma,S_{(-k)}}^{-1}\right) - \log\left|\hat\Sigma_{\gamma,S_{(-k)}}^{-1}\right| \right\}, \tag{8}$$

where $\mathcal{G}$ is a user-selected finite subset of the non-negative real line, $S_{(-k)}$ is the sample covariance matrix from the observations outside the $k$th fold, and $S_{(k)}$ is the sample covariance matrix from the observations in the $k$th fold centered by the sample mean of the observations outside the $k$th fold. We estimate $\Delta_*^{-1}$ using (7) with its tuning parameter selected by (8) and $S = (\mathbb{X} - \mathbb{Y}\hat\eta^{(L_1)})'(\mathbb{X} - \mathbb{Y}\hat\eta^{(L_1)})/n$. Similarly, we estimate $\Sigma_{*YY}^{-1}$ using (7) with its tuning parameter selected by (8) and $S = \mathbb{Y}'\mathbb{Y}/n$.
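The column-by-column separation of the $L_1$-penalized inverse regression can be illustrated with a small coordinate descent solver. This is a minimal sketch assuming NumPy, fixed tuning parameters (the cross-validation step is omitted), and simulated data; the function names and the choice of coordinate descent are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_column(Y, x, lam, n_sweeps=200):
    """Minimize ||x - Y b||^2 + lam * ||b||_1 over b by cyclic coordinate descent."""
    q = Y.shape[1]
    b = np.zeros(q)
    col_ss = np.sum(Y ** 2, axis=0)  # squared norm of each response column
    resid = x.astype(float).copy()   # equals x - Y b because b starts at zero
    for _ in range(n_sweeps):
        for i in range(q):
            resid += Y[:, i] * b[i]  # remove coordinate i from the fit
            rho = Y[:, i] @ resid
            b[i] = soft_threshold(rho, lam / 2.0) / col_ss[i]
            resid -= Y[:, i] * b[i]  # add the updated coordinate back
    return b

def sparse_inverse_coefficients(X, Y, lams):
    """Solve the penalized inverse regression one predictor at a time (q by p result)."""
    columns = [lasso_column(Y, X[:, j], lams[j]) for j in range(X.shape[1])]
    return np.column_stack(columns)

# Simulated, column-centered data (illustrative only).
rng = np.random.default_rng(1)
n, p, q = 100, 6, 4
Y = rng.standard_normal((n, q))
Y -= Y.mean(axis=0)
eta_true = rng.standard_normal((q, p)) * rng.binomial(1, 0.3, size=(q, p))
X = Y @ eta_true + 0.5 * rng.standard_normal((n, p))
X -= X.mean(axis=0)

eta_unpenalized = sparse_inverse_coefficients(X, Y, np.zeros(p))
eta_hat = sparse_inverse_coefficients(X, Y, np.full(p, 20.0))
lam_big = 2.0 * np.max(np.abs(Y.T @ X)) + 1.0
eta_null = sparse_inverse_coefficients(X, Y, np.full(p, lam_big))
```

Because each column of the result depends only on one column of `X`, the $p$ problems could be run in parallel, each with its own tuning parameter.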
4.2 Reduced rank inverse regression

We propose indirect estimators of $\beta_*$ that exploit the assumption that the inverse regression's coefficient matrix $\eta_*$ is rank deficient. We have the following simple proposition that links rank deficiency in $\eta_*$ and its estimator to $\beta_*$ and its indirect estimator.

Proposition 3. If $\Sigma_*$ is positive definite, then $\mathrm{rank}(\beta_*) = \mathrm{rank}(\eta_*)$. In addition, if $\hat\Sigma_{YY}^{-1}$ and $\hat\Delta^{-1}$ are positive definite in the indirect estimator $\hat\beta$ defined by (4), then $\mathrm{rank}(\hat\beta) = \mathrm{rank}(\hat\eta)$.

The proof of this proposition is simple so we excluded it to save space.
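The rank statement in Proposition 3 can be checked numerically. The sketch below, assuming NumPy, builds an arbitrary rank-deficient coefficient matrix and arbitrary positive definite plug-in matrices (none of them are the likelihood-based estimators of this section) and confirms that the plug-in estimator inherits the rank; all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, r = 6, 4, 2

# A q by p coefficient matrix of rank r, built as a product of thin factors.
eta_hat = rng.standard_normal((q, r)) @ rng.standard_normal((r, p))

def random_positive_definite(dim, rng):
    """Return A A' + I, which is symmetric and positive definite."""
    A = rng.standard_normal((dim, dim))
    return A @ A.T + np.eye(dim)

delta_inv_hat = random_positive_definite(p, rng)     # stands in for the error precision
sigma_yy_inv_hat = random_positive_definite(q, rng)  # stands in for the response precision

# Plug-in estimator: Delta^{-1} eta' (Sigma_YY^{-1} + eta Delta^{-1} eta')^{-1}.
middle = sigma_yy_inv_hat + eta_hat @ delta_inv_hat @ eta_hat.T
beta_hat = delta_inv_hat @ eta_hat.T @ np.linalg.inv(middle)

rank_eta = np.linalg.matrix_rank(eta_hat)
rank_beta = np.linalg.matrix_rank(beta_hat)
```

The rank is preserved because the transposed coefficient matrix is multiplied on both sides by invertible matrices.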

We propose the following two example reduced rank indirect estimators of $\beta_*$:

1. Estimate $\Sigma_{*YY}$ with $\mathbb{Y}'\mathbb{Y}/n$ and estimate $(\eta_*, \Delta_*^{-1})$ with Normal likelihood reduced rank inverse regression:

$$(\hat\eta^{(r)}, \hat\Delta^{-1(r)}) = \arg\min_{(\eta, \Omega) \in \mathbb{R}^{q \times p} \times \mathbb{S}_+^p} \left[ n^{-1}\mathrm{tr}\left\{(\mathbb{X} - \mathbb{Y}\eta)'(\mathbb{X} - \mathbb{Y}\eta)\Omega\right\} - \log\det(\Omega) \right] \quad \text{subject to } \mathrm{rank}(\eta) = r, \tag{9}$$

where $r$ is selected from $\{0, \ldots, \min(p, q)\}$. The solution to the optimization in (9) is available in closed form (Reinsel and Velu, 1998).

2. Estimate $\eta_*$ with $\hat\eta^{(r)}$ defined in (9), estimate $\Sigma_{*YY}^{-1}$ with (7) using $S = \mathbb{Y}'\mathbb{Y}/n$, and estimate $\Delta_*^{-1}$ with (7) using $S = (\mathbb{X} - \mathbb{Y}\hat\eta^{(r)})'(\mathbb{X} - \mathbb{Y}\hat\eta^{(r)})/n$.

Both example indirect reduced rank estimators of $\beta_*$ are formed by plugging in the estimators of $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$ to (4). The first estimator is likelihood-based and the second estimator exploits sparsity in $\Sigma_{*YY}^{-1}$ and $\Delta_*^{-1}$. Neither estimator is defined when $\min(p, q) > n$. In this case, which we do not address, a regularized reduced rank estimator of $\eta_*$ could be used instead of the estimator defined in (9), e.g. the factor estimation and selection estimator (Yuan et al., 2007) or the reduced rank ridge regression estimator (Mukherjee and Zhu, 2011).


5 Simulations 

5.1 Sparse inverse regression simulation 

We compared the following indirect estimators of $\beta_*$ when the inverse regression's coefficient matrix $\eta_*$ is sparse:

$I_{L_1}$. This is the indirect estimator proposed in Section 4.1.

$I_S$. This is an indirect estimator defined by (4) with $\hat\eta$ defined by (6), $\hat\Sigma_{YY} = \mathbb{Y}'\mathbb{Y}/n$, and $\hat\Delta = (\mathbb{X} - \mathbb{Y}\hat\eta^{(L_1)})'(\mathbb{X} - \mathbb{Y}\hat\eta^{(L_1)})/n$.

$O_\Delta$. This is a part oracle indirect estimator defined by (4) with $\hat\eta$ defined by (6), $\hat\Sigma_{YY}^{-1}$ defined by (7), and $\hat\Delta^{-1} = \Delta_*^{-1}$.

$O$. This is a part oracle indirect estimator defined by (4) with $\hat\eta$ defined by (6), $\hat\Sigma_{YY}^{-1} = \Sigma_{*YY}^{-1}$, and $\hat\Delta^{-1} = \Delta_*^{-1}$.

$O_Y$. This is a part oracle indirect estimator defined by (4) with $\hat\eta$ defined by (6), $\hat\Sigma_{YY}^{-1} = \Sigma_{*YY}^{-1}$, and $\hat\Delta^{-1}$ defined by (7).

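We also included the following forward regression estimators of $\beta_*$:

OLS/MP. This is the ordinary least squares estimator defined by $\arg\min_{\beta \in \mathbb{R}^{p \times q}} \|\mathbb{Y} - \mathbb{X}\beta\|_F^2$. When $n < p$, we use the solution $\mathbb{X}^{-}\mathbb{Y}$, where $\mathbb{X}^{-}$ is the Moore-Penrose generalized inverse of $\mathbb{X}$.

R. This is the ridge penalized least squares estimator defined by

$$\arg\min_{\beta \in \mathbb{R}^{p \times q}} \left( \|\mathbb{Y} - \mathbb{X}\beta\|_F^2 + \lambda\|\beta\|_F^2 \right).$$

$\ell_2$. This is an alternative ridge penalized least squares estimator defined by

$$\arg\min_{\beta \in \mathbb{R}^{p \times q}} \left( \|\mathbb{Y} - \mathbb{X}\beta\|_F^2 + \sum_{m=1}^{q} \lambda_m \sum_{j=1}^{p} \beta_{jm}^2 \right),$$

where a separate tuning parameter is used for each response.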

We selected the tuning parameters for uses of (6) with 5-fold cross-validation, minimizing validation prediction error on the inverse regression. Tuning parameters for $\ell_2$ and R were selected with 5-fold cross-validation, minimizing validation prediction error on the forward regression. We selected tuning parameters for uses of (7) with (8). The candidate set of tuning parameters was $\{10^{-8}, 10^{-7.5}, \ldots, 10^{7.5}, 10^{8}\}$.

For 50 independent replications, we generated a realization of $n$ independent copies of $(X', Y')'$, where $Y \sim N_q(0, \Sigma_{*YY})$ and $(X \mid Y = y) \sim N_p(\eta_*'y, \Delta_*)$. The $(i, j)$th entry of $\Sigma_{*YY}$ was set to $\rho_Y^{|i-j|}$ and the $(i, j)$th entry of $\Delta_*$ was set to $\rho_\Delta^{|i-j|}$. We set $\eta_* = Z \circ A$, where $\circ$ denotes the element-wise product: $Z$ had entries independently drawn from $N(0, 1)$ and $A$ had entries independently drawn from the Bernoulli distribution with nonzero probability $s_*$. This model is ideal for $I_{L_1}$ because $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ are both sparse. Every entry in the corresponding randomly generated $\beta_*$ is nonzero with high probability, but the magnitudes of these entries are small. This motivated us to compare our indirect estimators of $\beta_*$ to the ridge-penalized least squares forward regression estimators R and $\ell_2$.

We evaluated performance with model error (Breiman and Friedman, 1997; Yuan et al., 2007), which is defined by $\|\Sigma_{*XX}^{1/2}(\hat\beta - \beta_*)\|_F^2$.
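The data generating model and the model error criterion can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the covariance entries are read as $\rho^{|i-j|}$, the forward coefficient matrix is recovered from the inverse regression parameters through Proposition 1, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def ar1(dim, rho):
    """Matrix with (i, j)th entry rho^{|i-j|}."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_inverse_model(n, p, q, rho_y, rho_delta, s, rng):
    """Draw (X, Y) from the sparse inverse regression model described above."""
    sigma_yy = ar1(q, rho_y)
    delta = ar1(p, rho_delta)
    # Element-wise product of Normal draws and a Bernoulli(s) mask.
    eta = rng.standard_normal((q, p)) * rng.binomial(1, s, size=(q, p))
    Y = rng.multivariate_normal(np.zeros(q), sigma_yy, size=n)
    X = Y @ eta + rng.multivariate_normal(np.zeros(p), delta, size=n)
    return X, Y, eta, delta, sigma_yy

def forward_parameters(eta, delta, sigma_yy):
    """Forward coefficient matrix via Proposition 1, and the predictor covariance."""
    delta_inv = np.linalg.inv(delta)
    middle = np.linalg.inv(sigma_yy) + eta @ delta_inv @ eta.T
    beta = delta_inv @ eta.T @ np.linalg.inv(middle)
    sigma_xx = delta + eta.T @ sigma_yy @ eta
    return beta, sigma_xx

def model_error(beta_hat, beta, sigma_xx):
    """Squared Frobenius norm of Sigma_XX^{1/2} (beta_hat - beta)."""
    diff = beta_hat - beta
    return np.trace(diff.T @ sigma_xx @ diff)

rng = np.random.default_rng(5)
n, p, q = 100, 20, 20
X, Y, eta, delta, sigma_yy = simulate_inverse_model(n, p, q, 0.7, 0.9, 0.1, rng)
beta, sigma_xx = forward_parameters(eta, delta, sigma_yy)

# Model error of forward least squares on this single replication.
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
beta_ols = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
me_ols = model_error(beta_ols, beta, sigma_xx)
```

Averaging `model_error` over independent replications gives the kind of summary reported in the simulation tables.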

We report the average model errors, based on these 50 replications, in Table 1. When $s_* = 0.1$, the indirect estimators defined by (4) performed well for all choices of $\rho_Y$ and $\rho_\Delta$. Our proposed estimator $I_{L_1}$ was competitive with other indirect estimators also defined by (4), even those that used some oracle information. As $s_*$ increased with $\rho_Y = 0.7$ and $\rho_\Delta = 0.9$ fixed, the forward regression estimators performed nearly as well as $I_{L_1}$.

Similarly, Table 2 shows that when $s_* = 0.1$, $I_{L_1}$ outperforms all three forward regression estimators. However, unlike in the lower dimensional setting illustrated in Table 1, when $\eta_*$ is not sparse, i.e. $s_* \geq 0.3$, $I_{L_1}$ is outperformed by forward regression approaches. The part oracle method $O_Y$ that used the knowledge of $\Sigma_{*YY}^{-1}$ outperformed the other two part oracle indirect estimators $O$ and $O_\Delta$ when $\rho_\Delta = 0.9$. Also, when $\rho_\Delta = 0.9$, $I_{L_1}$ was competitive with the part oracle estimators. Taken together, the results in Tables 1 and 2 suggest that when $\eta_*$ is very sparse, our proposed indirect estimator $I_{L_1}$ may perform nearly as well as the part oracle indirect estimators and the forward regression estimators.
Table 1: Averages of model error from 50 replications when $n = 100$, $p = 20$, and $q = 20$. All standard errors were less than or equal to 0.05.

$\rho_Y$   $\rho_\Delta$   $s_*$   $I_{L_1}$   $O$    $O_\Delta$   $O_Y$   $I_S$   OLS    $\ell_2$   R
0.7        0.0             0.1     0.61        0.32   0.53         0.40    1.35    2.10   1.23       1.22
0.7        0.5             0.1     0.72        0.39   0.59         0.51    1.30    1.91   1.29       1.30
0.7        0.7             0.1     0.76        0.45   0.65         0.56    1.27    1.73   1.27       1.29
0.7        0.9             0.1     0.83        0.66   0.85         0.64    1.26    1.35   1.05       1.09
0.0        0.9             0.1     0.81        0.87   0.87         0.79    2.04    2.34   1.26       1.87
0.5        0.9             0.1     0.96        0.76   0.99         0.74    1.63    1.84   1.36       1.49
0.9        0.9             0.1     0.46        0.39   0.47         0.36    0.63    0.62   0.48       0.48
0.7        0.9             0.3     0.60        0.53   0.65         0.46    0.83    0.67   0.64       0.63
0.7        0.9             0.5     0.48        0.37   0.48         0.37    0.65    0.53   0.52       0.51
0.7        0.9             0.7     0.42        0.29   0.39         0.31    0.55    0.46   0.45       0.44


Table 2: Averages of model error from 50 replications when $n = 50$, $p = 60$, and $q = 60$. All standard errors were 0.69 or less, except for MP, which had standard errors between 0.77 and 3.16.

$\rho_Y$   $\rho_\Delta$   $s_*$   $I_{L_1}$   $O$     $O_\Delta$   $O_Y$   MP       $\ell_2$   R
0.7        0.0             0.1     8.59        4.28    5.70         7.40    78.33    13.85      12.44
0.7        0.5             0.1     9.67        5.09    6.37         8.49    73.82    14.79      13.34
0.7        0.7             0.1     10.01       6.37    7.44         8.75    70.30    15.56      14.40
0.7        0.9             0.1     9.92        10.07   11.44        8.88    61.83    16.43      15.94
0.0        0.9             0.1     15.17       17.09   16.93        15.23   119.60   28.63      29.41
0.5        0.9             0.1     14.88       13.59   16.91        12.01   86.88    23.62      22.69
0.9        0.9             0.1     4.71        4.78    5.94         3.99    25.37    6.36       5.91
0.7        0.9             0.3     16.86       17.43   19.66        15.44   43.88    15.30      14.14
0.7        0.9             0.5     26.89       26.81   29.93        24.95   36.87    14.79      13.62
0.7        0.9             0.7     31.86       35.98   38.64        30.36   33.58    14.35      13.65


5.2 Reduced rank inverse regression simulation 

We compared the performance of the following indirect reduced rank estimators of $\beta_*$:

$I_{ML}^{(r)}$. This is the likelihood-based indirect example estimator 1 proposed in Section 4.2.

$I^{(r)}$. This is the indirect example estimator 2 proposed in Section 4.2, which uses sparse estimators of $\Sigma_{*YY}^{-1}$ and $\Delta_*^{-1}$ in (4).

$O^{(r)}$. This is a part oracle indirect estimator defined by (4) with $\hat\eta$ defined by (9), $\hat\Delta^{-1} = \Delta_*^{-1}$, and $\hat\Sigma_{YY}^{-1} = \Sigma_{*YY}^{-1}$.

$O_Y^{(r)}$. This is a part oracle indirect estimator defined by (4) with $\hat\eta$ defined by (9), $\hat\Delta^{-1}$ defined by (7), and $\hat\Sigma_{YY}^{-1} = \Sigma_{*YY}^{-1}$.

$O_\Delta^{(r)}$. This is a part oracle indirect estimator defined by (4) with $\hat\eta$ defined by (9), $\hat\Delta^{-1} = \Delta_*^{-1}$, and $\hat\Sigma_{YY}^{-1}$ defined by (7).
Table 3: Averages of model error from 50 replications when $n = 100$, $p = 20$, and $q = 20$. All standard errors were less than or equal to 0.05.

$\rho_Y$   $\rho_\Delta$   $r_*$   $I^{(r)}$   $O^{(r)}$   $O_\Delta^{(r)}$   $O_Y^{(r)}$   $I_{ML}^{(r)}$   OLS    RR
0.7        0.0             10      0.33        0.04        0.86               0.75          0.64             1.38   0.64
0.7        0.5             10      0.34        0.04        0.86               0.74          0.60             1.31   0.60
0.7        0.7             10      0.31        0.03        0.86               0.80          0.62             1.32   0.61
0.7        0.9             10      0.31        0.02        0.85               0.88          0.60             1.30   0.61
0.0        0.9             10      0.15        0.03        1.00               1.77          1.22             2.61   1.21
0.5        0.9             10      0.42        0.01        1.11               1.36          0.90             1.97   0.89
0.9        0.9             10      0.12        0.01        0.32               0.30          0.22             0.46   0.22
0.7        0.9             4       0.35        0.02        1.73               2.61          0.49             3.12   0.49
0.7        0.9             8       0.35        0.01        1.15               1.33          0.68             1.73   0.65
0.7        0.9             12      0.31        0.04        0.64               0.59          0.55             0.96   0.53
0.7        0.9             16      0.25        0.08        0.30               0.20          0.44             0.50   0.42


We compared these indirect estimators to the following forward reduced rank regression estimator:

RR. This is the likelihood based reduced rank regression (Izenman, 1975; Reinsel and Velu, 1998). The estimator of $\beta_*$ and the estimator of the forward regression's error precision matrix $\Sigma_{*E}^{-1}$ are defined by

$$(\hat\beta^{(\mathrm{RR})}, \hat\Omega^{(\mathrm{RR})}) = \arg\min_{(\beta, \Omega) \in \mathbb{R}^{p \times q} \times \mathbb{S}_+^q} \left[ n^{-1}\mathrm{tr}\left\{(\mathbb{Y} - \mathbb{X}\beta)'(\mathbb{Y} - \mathbb{X}\beta)\Omega\right\} - \log\det(\Omega) \right] \quad \text{subject to } \mathrm{rank}(\beta) = r.$$

We selected the rank parameter $r$ for uses of (9) with 5-fold cross-validation, minimizing validation prediction error on the inverse regression. The rank parameter for RR was selected with 5-fold cross-validation, minimizing validation prediction error on the forward regression. We selected tuning parameters for uses of (7) with (8). The candidate set of tuning parameters was $\{10^{-8}, 10^{-7.5}, \ldots, 10^{8}\}$.

For 50 independent replications, we generated a realization of $n$ independent copies of $(X', Y')'$ where $Y \sim N_q(0, \Sigma_{*YY})$ and $(X \mid Y = y) \sim N_p(\eta_*'y, \Delta_*)$. The $(i, j)$th entry of $\Sigma_{*YY}$ was set to $\rho_Y^{|i-j|}$ and the $(i, j)$th entry of $\Delta_*$ was set to $\rho_\Delta^{|i-j|}$. After specifying $r_* < \min(p, q)$, we set $\eta_* = PQ$, where $P \in \mathbb{R}^{q \times r_*}$ and $Q \in \mathbb{R}^{r_* \times p}$ had entries independently drawn from $N(0, 1)$ so that $r_* = \mathrm{rank}(\eta_*) = \mathrm{rank}(\beta_*)$. As we did in the simulation in Section 5.1, we measured performance with model error.

We report the model errors, averaged over the 50 independent replications, in Table 3. Under every setting, $I^{(r)}$ outperformed all non-oracle competitors. When $r_* \leq 12$, $I^{(r)}$ outperformed both $O_\Delta^{(r)}$ and $O_Y^{(r)}$, which suggests that shrinkage estimation of $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ was helpful. In each setting, $I_{ML}^{(r)}$ performed similarly to RR even though they are estimating parameters of different conditional distributions.





















5.3 Reduced rank forward regression simulation

Our simulation studies in the previous sections used inverse regression data generating models. In this section, we compare the estimators from Section 5.2 using a forward regression data generating model.

For 50 independent replications, we generated a realization of $n$ independent copies of $(X', Y')'$ where $X \sim N_p(0, \Sigma_{*XX})$ and $(Y \mid X = x) \sim N_q(\beta_*'x, \Sigma_{*E})$. The $(i, j)$th entry of $\Sigma_{*XX}$ was set to $\rho_X^{|i-j|}$ and the $(i, j)$th entry of $\Sigma_{*E}$ was set to $\rho_E^{|i-j|}$. After specifying $r_* < \min(p, q)$, we set $\beta_* = ZQ$ where $Z \in \mathbb{R}^{p \times r_*}$ had entries independently drawn from $N(0, 1)$ and $Q \in \mathbb{R}^{r_* \times q}$ had entries independently drawn from $\mathrm{Uniform}(-1/4, 1/4)$. In this data generating model, neither $\Delta_*^{-1}$ nor $\Sigma_{*YY}^{-1}$ had entries equal to zero.


Table 4: Averages of model error from 50 replications when $n = 100$, $p = 20$, and $q = 20$. All standard errors were less than or equal to 0.21.

$\rho_X$   $\rho_E$   $r_*$   $I^{(r)}$   $O^{(r)}$   $O_\Delta^{(r)}$   $O_Y^{(r)}$   $I_{ML}^{(r)}$   OLS    RR
0.0        0.9        10      2.79        0.54        4.27               5.05          2.48             4.99   2.82
0.5        0.9        10      2.90        0.47        5.36               5.94          2.73             5.00   2.89
0.7        0.9        10      2.97        0.51        4.64               5.03          2.71             4.93   2.76
0.9        0.9        10      2.84        0.73        3.78               4.16          2.67             5.19   2.73
0.7        0.0        10      4.66        1.92        3.59               5.88          4.53             5.11   4.34
0.7        0.5        10      4.27        1.65        3.88               5.51          3.99             5.06   3.97
0.7        0.7        10      3.55        1.26        3.99               5.29          3.43             5.00   3.44
0.7        0.9        4       1.27        0.08        3.84               4.71          0.95             5.00   1.11
0.7        0.9        8       2.39        0.36        4.15               5.15          2.05             4.81   2.22
0.7        0.9        12      3.58        0.79        4.44               5.21          3.20             5.15   3.27
0.7        0.9        16      4.53        1.29        4.62               4.42          4.33             5.11   4.38


The model errors, averaged over the 50 replications, are reported in Table 4. Both $I^{(r)}$ and $I_{ML}^{(r)}$ were competitive with RR in most settings. Although neither $\Delta_*^{-1}$ nor $\Sigma_{*YY}^{-1}$ were sparse, we again see that $I^{(r)}$ generally outperforms $O_Y^{(r)}$ and $O_\Delta^{(r)}$, both of which use some oracle information. These results indicate that shrinkage estimators of $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ in (4) are helpful when neither is sparse.


6 Tobacco chemical composition data example 


As an example application, we use the chemical composition of tobacco leaves data from Anderson and Bancroft (1952) and Izenman (2009). These data have $n = 25$ cases, $p = 6$ predictors, and $q = 3$ responses. The names of the predictors, taken from page 183 of Izenman (2009), are percent nitrogen, percent chlorine, percent potassium, percent phosphorus, percent calcium, and percent magnesium. The names of the response variables, also taken from page 183 of Izenman (2009), are rate of cigarette burn in inches per 1,000 seconds, percent sugar in the leaf, and percent nicotine in the leaf. In these data, it may be inappropriate to assume that $\Delta_*^{-1}$ is sparse. For this reason, we consider another example indirect estimator of $\beta_*$ called $I_{L_2}$ that estimates $\eta_*$ with (6), estimates $\Sigma_{*YY}^{-1}$ with (7) using $S = \mathbb{Y}'\mathbb{Y}/n$, and estimates $\Delta_*^{-1}$ with

$$\arg\min_{\Omega \in \mathbb{S}_+^p} \left\{ \mathrm{tr}(\Omega S) - \log\det(\Omega) + \gamma \sum_{j,k} |\omega_{jk}|^2 \right\}, \tag{10}$$

where $S = (\mathbb{X} - \mathbb{Y}\hat\eta^{(L_1)})'(\mathbb{X} - \mathbb{Y}\hat\eta^{(L_1)})/n$. We compute (10) with the closed form solution derived by Witten and Tibshirani (2009). As before, we select $\gamma$ from $\{10^{-8}, 10^{-7.5}, \ldots, 10^{7.5}, 10^{8}\}$ using (8).

We also consider the forward regression estimators RR, $\ell_2$, and OLS defined in Section 5.1 and Section 5.2. We introduce another competitor $\ell_1$, defined as

$$\arg\min_{\beta \in \mathbb{R}^{p \times q}} \left\{ \|\mathbb{Y} - \mathbb{X}\beta\|_F^2 + \sum_{j=1}^{q} \lambda_j \sum_{i=1}^{p} |\beta_{ij}| \right\},$$

which is equivalent to performing $q$ separate lasso regressions (Tibshirani, 1996). We randomly split the data into a 40% test set and 60% training set in each of 500 replications and we measured the squared prediction error on the test set. All tuning parameters were chosen from $\{10^{-8}, 10^{-7.5}, \ldots, 10^{7.5}, 10^{8}\}$ by 5-fold cross validation.
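The repeated random-split evaluation can be sketched as follows. This is an illustrative NumPy sketch only: the tobacco data are not reproduced here, so simulated data with the same dimensions stand in for them, ordinary least squares stands in for the estimators being compared, and the function names are hypothetical.

```python
import numpy as np

def ols_fit(X_train, Y_train):
    """Fit centered least squares and return a prediction function (stand-in method)."""
    x_bar, y_bar = X_train.mean(axis=0), Y_train.mean(axis=0)
    beta = np.linalg.lstsq(X_train - x_bar, Y_train - y_bar, rcond=None)[0]
    return lambda X_new: y_bar + (X_new - x_bar) @ beta

def split_prediction_error(X, Y, fit, test_fraction=0.4, n_reps=500, seed=0):
    """Average squared test-set prediction error per response over random splits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = int(round(test_fraction * n))
    errors = np.zeros((n_reps, Y.shape[1]))
    for rep in range(n_reps):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        predict = fit(X[train], Y[train])
        errors[rep] = np.mean((Y[test] - predict(X[test])) ** 2, axis=0)
    # Mean over replications and its standard error, one entry per response.
    return errors.mean(axis=0), errors.std(axis=0, ddof=1) / np.sqrt(n_reps)

# Simulated stand-in data with n = 25, p = 6, q = 3 (not the tobacco data).
rng = np.random.default_rng(2)
X = rng.standard_normal((25, 6))
B = rng.standard_normal((6, 3))
Y_clean = 1.0 + X @ B
Y_noisy = Y_clean + rng.standard_normal((25, 3))

mean_clean, _ = split_prediction_error(X, Y_clean, ols_fit, n_reps=50)
mean_noisy, se_noisy = split_prediction_error(X, Y_noisy, ols_fit, n_reps=50)
```

With 25 cases and a 40% test fraction, each replication predicts 10 held-out cases; any other estimator can be compared by passing a different `fit` function.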


Table 5: Averages of squared prediction error, with standard errors in parentheses, for each response variable from 500 replications.




$I_{L_1}$

$I_{L_2}$

OLS

RR

$\ell_2$

$\ell_1$

Rate of burn 

1.19 

1.33 

0.45 

2.96 

2.17 

0.57 

1.55 


(0.08) 

(0.10)

(0.03) 

(0.15) 

(0.15) 

(0.07) 

(0.13) 

Percent sugar 

442.38 

347.76 

235.55 

799.03 

605.30 

365.13 

583.98 


(17.97) 

(21.31) 

(6.31) 

(29.45) 

(25.52) 

(20.68)

(24.36) 

Percent nicotine

2.55 

2.54 

0.79 

5.65 

4.59 

0.81 

2.82 


(0.29) 

(0.30) 

(0.05) 

(0.41) 

(0.31) 

(0.21)

(0.29) 


Table 5 shows squared prediction errors, averaged over the 10 predictions and the 500 replications. These results indicate that $I_{L_2}$ outperforms all the competitors we considered. Also, $I_{L_1}$ was outperformed by $\ell_2$, but was competitive with separate lasso regressions. Reduced rank regression was not competitive with the proposed indirect estimators.
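A Appendix


A.1 Proofs

Proof of Proposition 1. Since $\Sigma_*$ is positive definite, we apply the partitioned inverse formula to obtain that

$$\Sigma_*^{-1} = \begin{pmatrix} \Sigma_{*XX} & \Sigma_{*XY} \\ \Sigma_{*XY}' & \Sigma_{*YY} \end{pmatrix}^{-1} = \begin{pmatrix} \Delta_*^{-1} & -\beta_*\Sigma_{*E}^{-1} \\ -\eta_*\Delta_*^{-1} & \Sigma_{*E}^{-1} \end{pmatrix},$$

where $\Delta_* = \Sigma_{*XX} - \Sigma_{*XY}\Sigma_{*YY}^{-1}\Sigma_{*XY}'$ and $\Sigma_{*E} = \Sigma_{*YY} - \Sigma_{*XY}'\Sigma_{*XX}^{-1}\Sigma_{*XY}$. The symmetry of $\Sigma_*^{-1}$ implies that $\beta_*\Sigma_{*E}^{-1} = (\eta_*\Delta_*^{-1})'$ so

$$\beta_* = \Delta_*^{-1}\eta_*'\Sigma_{*E}. \tag{11}$$

Using the Woodbury identity,

$$\begin{aligned} \Sigma_{*E}^{-1} &= (\Sigma_{*YY} - \Sigma_{*XY}'\Sigma_{*XX}^{-1}\Sigma_{*XY})^{-1} \\ &= \Sigma_{*YY}^{-1} + \Sigma_{*YY}^{-1}\Sigma_{*XY}'(\Sigma_{*XX} - \Sigma_{*XY}\Sigma_{*YY}^{-1}\Sigma_{*XY}')^{-1}\Sigma_{*XY}\Sigma_{*YY}^{-1} \\ &= \Sigma_{*YY}^{-1} + \eta_*\Delta_*^{-1}\eta_*'. \end{aligned} \tag{12}$$

Using the inverse of the expression above in (11) establishes the result. □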

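In our proof of Proposition 2, we use the matrix inequality

$$\begin{aligned} \|A^{(1)}A^{(2)}A^{(3)} - B^{(1)}B^{(2)}B^{(3)}\| \leq{} & \sum_{j=1}^{3} \|A^{(j)} - B^{(j)}\| \prod_{k \neq j} \|B^{(k)}\| \\ & + \sum_{j=1}^{3} \|B^{(j)}\| \prod_{k \neq j} \|A^{(k)} - B^{(k)}\| + \prod_{j=1}^{3} \|A^{(j)} - B^{(j)}\|. \end{aligned} \tag{13}$$

Bickel and Levina (2008) used (13) to prove their Theorem 3.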
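The three-factor product inequality can be checked numerically on random matrices. The following NumPy sketch evaluates both sides in the spectral norm; the matrix sizes, the perturbation scale, and the function names are arbitrary illustrative choices.

```python
import numpy as np

def spec(M):
    """Spectral norm (largest singular value) of a matrix."""
    return np.linalg.norm(M, 2)

def product_bound(A, B):
    """Upper bound on ||A1 A2 A3 - B1 B2 B3|| for lists A, B of three matrices."""
    d = [spec(a - b) for a, b in zip(A, B)]  # norms of the differences
    nb = [spec(b) for b in B]                # norms of the reference factors
    idx = range(3)
    one_diff = sum(d[j] * np.prod([nb[k] for k in idx if k != j]) for j in idx)
    two_diff = sum(nb[j] * np.prod([d[k] for k in idx if k != j]) for j in idx)
    return one_diff + two_diff + np.prod(d)

rng = np.random.default_rng(3)
B = [
    rng.standard_normal((4, 5)),
    rng.standard_normal((5, 3)),
    rng.standard_normal((3, 3)),
]
A = [b + 0.1 * rng.standard_normal(b.shape) for b in B]  # perturbed factors

lhs = spec(A[0] @ A[1] @ A[2] - B[0] @ B[1] @ B[2])
rhs = product_bound(A, B)
```

The bound follows from expanding each perturbed factor as the reference factor plus a difference and applying the triangle inequality and submultiplicativity to the seven resulting terms.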


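Proof of Proposition 2. From (12) in the proof of Proposition 1, $\Sigma_{*E}^{-1} = \Sigma_{*YY}^{-1} + \eta_*\Delta_*^{-1}\eta_*'$. Define $\hat\Sigma_E^{-1} = \hat\Sigma_{YY}^{-1} + \hat\eta\hat\Delta^{-1}\hat\eta'$. Applying (13),

$$\begin{aligned} \|\hat\beta - \beta_*\| ={} & \|\hat\Delta^{-1}\hat\eta'\hat\Sigma_E - \Delta_*^{-1}\eta_*'\Sigma_{*E}\| \\ \leq{} & \|\hat\Delta^{-1} - \Delta_*^{-1}\| \cdot \|\eta_*\| \cdot \|\Sigma_{*E}\| + \|\hat\eta - \eta_*\| \cdot \|\Delta_*^{-1}\| \cdot \|\Sigma_{*E}\| + \|\hat\Sigma_E - \Sigma_{*E}\| \cdot \|\Delta_*^{-1}\| \cdot \|\eta_*\| \\ & + \|\Delta_*^{-1}\| \cdot \|\hat\eta - \eta_*\| \cdot \|\hat\Sigma_E - \Sigma_{*E}\| + \|\eta_*\| \cdot \|\hat\Delta^{-1} - \Delta_*^{-1}\| \cdot \|\hat\Sigma_E - \Sigma_{*E}\| \\ & + \|\Sigma_{*E}\| \cdot \|\hat\Delta^{-1} - \Delta_*^{-1}\| \cdot \|\hat\eta - \eta_*\| + \|\hat\eta - \eta_*\| \cdot \|\hat\Delta^{-1} - \Delta_*^{-1}\| \cdot \|\hat\Sigma_E - \Sigma_{*E}\|. \end{aligned} \tag{14}$$

We will show that the third term in (14) dominates the others. We continue by deriving its bound. Employing a matrix identity used by Cai et al. (2010), we write $\hat\Sigma_E - \Sigma_{*E} = \Sigma_{*E}(\Sigma_{*E}^{-1} - \hat\Sigma_E^{-1})\hat\Sigma_E$, so

$$\|\hat\Sigma_E - \Sigma_{*E}\| \leq \|\hat\Sigma_E\| \cdot \|\Sigma_{*E}\| \cdot \|\hat\Sigma_E^{-1} - \Sigma_{*E}^{-1}\|. \tag{15}$$

Using the triangle inequality and (13),

$$\begin{aligned} \|\hat\Sigma_E^{-1} - \Sigma_{*E}^{-1}\| \leq{} & \|\hat\Sigma_{YY}^{-1} - \Sigma_{*YY}^{-1}\| + \|\hat\eta\hat\Delta^{-1}\hat\eta' - \eta_*\Delta_*^{-1}\eta_*'\| \\ \leq{} & \|\hat\Sigma_{YY}^{-1} - \Sigma_{*YY}^{-1}\| + 2\|\hat\eta - \eta_*\| \cdot \|\Delta_*^{-1}\| \cdot \|\eta_*\| + \|\hat\Delta^{-1} - \Delta_*^{-1}\| \cdot \|\eta_*\|^2 \\ & + 2\|\eta_*\| \cdot \|\hat\Delta^{-1} - \Delta_*^{-1}\| \cdot \|\hat\eta - \eta_*\| + \|\Delta_*^{-1}\| \cdot \|\hat\eta - \eta_*\|^2 + \|\hat\eta - \eta_*\|^2\|\hat\Delta^{-1} - \Delta_*^{-1}\| \\ ={} & O_P\left(c_n + a_n\|\eta_*\| \cdot \|\Delta_*^{-1}\| + b_n\|\eta_*\|^2\right). \end{aligned} \tag{16}$$

Since $\varphi_{\min}(\Sigma_{*YY}^{-1}) > K$ and $\Delta_*^{-1}$ is positive definite, Weyl's eigenvalue inequality implies that $\varphi_{\min}(\Sigma_{*E}^{-1}) \geq K$ so

$$\|\Sigma_{*E}\| = \varphi_{\min}^{-1}(\Sigma_{*E}^{-1}) \leq 1/K. \tag{17}$$

Also,

$$\|\hat\Sigma_E\| = \varphi_{\min}^{-1}(\hat\Sigma_E^{-1}) = O_P(1), \tag{18}$$

because $\varphi_{\min}(\Sigma_{*E}^{-1}) \geq K$, $\hat\Sigma_E^{-1}$ is positive definite, and $a_n\|\eta_*\| \cdot \|\Delta_*^{-1}\| + b_n\|\eta_*\|^2 + c_n = o(1)$ in (16).

Using (16), (17), and (18) in (15),

$$\|\hat\Sigma_E - \Sigma_{*E}\| = O_P\left(a_n\|\eta_*\| \cdot \|\Delta_*^{-1}\| + b_n\|\eta_*\|^2 + c_n\right).$$

We then see that the third term in (14) dominates and

$$\begin{aligned} \|\hat\beta - \beta_*\| &= O_P\left\{\left(a_n\|\eta_*\| \cdot \|\Delta_*^{-1}\| + b_n\|\eta_*\|^2 + c_n\right)\|\eta_*\| \cdot \|\Delta_*^{-1}\|\right\} \\ &= O_P\left(a_n\|\eta_*\|^2\|\Delta_*^{-1}\|^2 + b_n\|\eta_*\|^3\|\Delta_*^{-1}\| + c_n\|\eta_*\| \cdot \|\Delta_*^{-1}\|\right). \end{aligned}$$

□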